About

This data set was downloaded from Kaggle. It was obtained through the Spotify API using spotipy, a lightweight Python library for the Spotify API.

Each row represents a song. There are 16 columns and a total of 2017 observations.

There are 13 audio features that Spotify uses, including danceability, loudness, energy and valence. A summary has been provided below, taken from the Spotify API documentation.


The Data

KEY | VALUE TYPE | VALUE DESCRIPTION
acousticness | float | A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
danceability | float | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
duration_ms | int | The duration of the track in milliseconds.
energy | float | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
instrumentalness | float | Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
key | int | The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.
liveness | float | Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
loudness | float | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
mode | int | Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
speechiness | float | Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
tempo | float | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
time_signature | int | An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure).
valence | float | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
target | int | User tag indicating whether the listener liked or disliked the song (0 = dislike, 1 = like).
song_title | string | The song title.
artist | string | The artist.

Cleaning the Data

While the data set was already clean and tidy, some additional preparation was undertaken:

Creating factors for Mode, Key and Target

# load dplyr for the pipe and mutate operations below
library(dplyr)

# convert time signature to a factor - it doesn't need to be ordered
df$time_signature <- factor(df$time_signature)

# recode key, mode and target as labelled factors
key_labs <- c('c', 'c#', 'd', 'd#', 'e', 'f',
              'f#', 'g', 'g#', 'a', 'a#', 'b')
mode_labs <- c('minor', 'major')
target_labs <- c('dislike', 'like')

df <- df %>%
  mutate(key = factor(key, labels = key_labs),
         mode = factor(mode, labels = mode_labs),
         target = factor(target, labels = target_labs))

levels(df$key)
##  [1] "c"  "c#" "d"  "d#" "e"  "f"  "f#" "g"  "g#" "a"  "a#" "b"
levels(df$mode)
## [1] "minor" "major"
levels(df$target)
## [1] "dislike" "like"

Univariate Plots Section

##   acousticness        danceability     duration_ms          energy      
##  Min.   :0.0000028   Min.   :0.1220   Min.   :  16042   Min.   :0.0148  
##  1st Qu.:0.0096300   1st Qu.:0.5140   1st Qu.: 200015   1st Qu.:0.5630  
##  Median :0.0633000   Median :0.6310   Median : 229261   Median :0.7150  
##  Mean   :0.1875900   Mean   :0.6184   Mean   : 246306   Mean   :0.6816  
##  3rd Qu.:0.2650000   3rd Qu.:0.7380   3rd Qu.: 270333   3rd Qu.:0.8460  
##  Max.   :0.9950000   Max.   :0.9840   Max.   :1004627   Max.   :0.9980  
##                                                                         
##  instrumentalness         key         liveness         loudness      
##  Min.   :0.0000000   c#     :257   Min.   :0.0188   Min.   :-33.097  
##  1st Qu.:0.0000000   c      :216   1st Qu.:0.0923   1st Qu.: -8.394  
##  Median :0.0000762   g      :212   Median :0.1270   Median : -6.248  
##  Mean   :0.1332855   a      :191   Mean   :0.1908   Mean   : -7.086  
##  3rd Qu.:0.0540000   b      :187   3rd Qu.:0.2470   3rd Qu.: -4.746  
##  Max.   :0.9760000   d      :184   Max.   :0.9690   Max.   : -0.307  
##                      (Other):770                                     
##     mode       speechiness          tempo        time_signature
##  minor: 782   Min.   :0.02310   Min.   : 47.86   1:   1        
##  major:1235   1st Qu.:0.03750   1st Qu.:100.19   3:  93        
##               Median :0.05490   Median :121.43   4:1891        
##               Mean   :0.09266   Mean   :121.60   5:  32        
##               3rd Qu.:0.10800   3rd Qu.:137.85                 
##               Max.   :0.81600   Max.   :219.33                 
##                                                                
##     valence           target      song_title           artist         
##  Min.   :0.0348   dislike: 997   Length:2017        Length:2017       
##  1st Qu.:0.2950   like   :1020   Class :character   Class :character  
##  Median :0.4920                  Mode  :character   Mode  :character  
##  Mean   :0.4968                                                       
##  3rd Qu.:0.6910                                                       
##  Max.   :0.9920                                                       
## 

Tempo Summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   47.86  100.19  121.43  121.60  137.85  219.33

Key, Mode, Tempo and Time Signature are all common compositional elements, useful for describing and classifying music. Key and Mode, for example, are both descriptive qualities, but when used in composition they can evoke very different emotional responses.

Originally, the number of bins for Tempo was set to 100, which was too granular for an overall summary. Reducing this to 50 bins provides a much better indication of the common song tempos in the data.
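
A minimal sketch of the kind of histogram described above, assuming ggplot2 (the axis labels are my own additions):

library(ggplot2)

# histogram of tempo with 50 bins, as chosen above
ggplot(df, aes(x = tempo)) +
  geom_histogram(bins = 50) +
  labs(x = 'Tempo (BPM)', y = 'Count')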

It is interesting to note that the most common key in this data set is C♯ (enharmonically equivalent to D♭) and the most common mode is major.

Christian Schubart’s descriptions from Ideen zu einer Aesthetik der Tonkunst (1806, available in translation) note that:

C# major: ‘A leering key, degenerating into grief and rapture. It cannot laugh, but it can smile; it cannot howl, but it can at least grimace its crying. Consequently only unusual characters and feelings can be brought out in this key.’

C# minor: ‘Penitential lamentation, intimate conversation with God’

The least common key is D♯ (enharmonically equivalent to E♭).

D# minor: ‘Feelings of the anxiety of the soul’s deepest distress, of brooding despair, of blackest depression, of the most gloomy condition of the soul. Every fear, every hesitation of the shuddering heart, breathes out of horrible D# minor. If ghosts could speak, their speech would approximate this key.’

D# major: ‘The key of love, of devotion, of intimate conversation with God.’

Regarding Time Signature, 4/4 meter (denoted as 4 in this dataset) is the most common time signature used in music; it has even been given its own name because of this - common time. It is therefore not surprising that the majority of songs in this data set are in common time.

The above plots provide a summary of Spotify’s song features. A higher number of bins (100) has been chosen to provide a more granular description of the data.

The majority of Danceability values fall between 0.5 and 0.7. Energy peaks around 0.8, and Valence is quite symmetrical with a gradual slope and curve. Loudness peaks between -10 and 0 dB. The majority of observations have a very low Liveness rating, indicating that most of the songs were studio recordings rather than live performances.

Danceability and Valence are both roughly bell-shaped, while the remaining distributions are all heavily skewed to the left or right.
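
A sketch of how a grid of these feature histograms could be produced; the exact set of features shown is an assumption, and reshaping with tidyr's gather is one way to get a facetted layout:

library(ggplot2)
library(dplyr)
library(tidyr)

# gather the selected features into long form, then facet one histogram per feature
df %>%
  select(acousticness, danceability, energy, liveness, loudness, valence) %>%
  gather(key = 'feature', value = 'value') %>%
  ggplot(aes(x = value)) +
  geom_histogram(bins = 100) +
  facet_wrap(~ feature, scales = 'free')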

Univariate Analysis

What is the structure of your dataset?

There are 2017 observations in this data set with 16 columns.

There are 14 key features:

  • acousticness - Confidence measure from 0:not acoustic, 1:acoustic.

  • danceability - How danceable the track is - 0.0:least danceable to 1.0:most danceable.

  • duration_ms - The duration of the track in milliseconds.

  • energy - Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity.

  • instrumentalness - A likelihood measure predicting whether a track contains no vocals - 0:vocal, 1:no vocals.

  • key - The key of the track.

  • liveness - Detects the presence of an audience in the recording. A value above 0.8 provides strong likelihood that the track is live.

  • loudness - The overall loudness of a track in decibels (dB).

  • mode - Mode indicates the modality (major or minor) of a track.

  • speechiness - Speechiness detects the presence of spoken words in a track. Values below 0.33 most likely represent music and other non-speech-like tracks.

  • tempo - The overall estimated tempo of a track in beats per minute (BPM).

  • time_signature - An estimated overall time signature of a track.

  • valence - A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. 0:less positive, 1:more positive

  • target - user tag 0:dislike, 1:like

What is/are the main feature(s) of interest in your dataset?

I am mainly interested in the relationships between some of Spotify’s song attributes. I’d like to explore potential relationships between speechiness, energy, danceability, loudness and valence. The data also includes a target variable generated by the creator of the data set; it records whether he/she liked or disliked a track.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

I think the categorical variables, Key, Mode and Target, will support my investigation by revealing interesting insights into the structural components of the tracks and how they might affect some of Spotify’s song features.

Did you create any new variables from existing variables in the dataset?

I did not create any new variables.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

I created factor levels for the Mode, Key and Target variables. This was mainly done to make the data easier to investigate. For anyone with some musical knowledge, it would be easier to follow the investigation if they could see key and mode names (i.e. C#, Db, Major, Minor) rather than having to constantly refer back to the features table to know that the key of C was represented as 0 in the data. It was also useful for keeping labels consistent when plotting.

Bivariate Plots Section

There seems to be a medium-to-strong correlation between Loudness and Energy, and potentially between Danceability and Valence. There are too many variables here, so I want to remove Liveness and Speechiness - I don’t think they are particularly strong measures. I’d also like to add Tempo, along with some correlation measures, using the GGally package.
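
A sketch of the kind of matrix plot this describes, assuming the GGally package; the variable selection follows the text above but is not the original chunk:

library(GGally)

# pairwise matrix plot with loess smoothing in the lower panels
ggpairs(df[, c('danceability', 'energy', 'loudness', 'tempo', 'valence')],
        lower = list(continuous = 'smooth_loess'))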

The correlation measures confirm that Loudness and Energy have a fairly strong relationship (0.762). Looking at Tempo and Energy is interesting because the correlation coefficient doesn’t indicate a strong relationship, but it is clear from the plot that the most energetic tracks sit around 120 bpm (beats per minute). In classical terms this is a common tempo known as Allegro, which ranges from 120-156 bpm. Modern popular music tempos for a number of different genres also sit around this range:

  • Techno: generally 120-135 bpm
  • House: varies between 118-135 bpm
  • Hip hop: around 80-115 bpm

The average tempo for this data set is 121.60 bpm.

A much simpler way to visualise the correlations is with the ggcorr function from the GGally package.
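
A minimal sketch of that call, assuming the numeric columns are selected first (label_round is my own choice):

library(GGally)

# correlation heatmap of the numeric features, with rounded coefficients
num_vars <- df[, sapply(df, is.numeric)]
ggcorr(num_vars, label = TRUE, label_round = 2)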

I wanted to include this plot as well as the ggpairs because they both provide different ways of visualising the correlations. The ggcorr plot provides a quick summary, highlighting values of interest. While the ggpairs plot adds the Loess smoothing, highlighting the shape of any potential relationships.

The boxplots for Key and Tempo above provide a clear indication of the consistency of median tempos across all keys.

We know from the univariate plots that the most popular key in the data set is C#. I want to subset the data and check some of the relationships of song attributes specifically in this key.
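
A minimal sketch of that subset (the object name df_csharp is an assumption):

# keep only tracks in the most common key
df_csharp <- df %>% filter(key == 'c#')
nrow(df_csharp)  # 257 tracks, per the key counts in the summary above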

Bivariate Analysis

Again we can see a positive relationship between Energy and Loudness in the key of C♯. The listener likes music that sits in the -10 to -5 dB range, peaking around -5 dB.

The distribution in the Energy plot above for ‘Liked/Disliked’ tracks is bimodal, peaking around 0.55 and 0.80.

Energy Summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0288  0.5950  0.7500  0.7119  0.8620  0.9920

I’d like to take a closer look at the correlations between some of the key features.

[correlation plots]

When plotted, the relationship for Valence and Danceability is fairly spread out, but it is possible to identify a trend in the data.
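
The output below reports df$valence and df$danceability as its inputs, so it comes from a call of this form (a reconstruction of the original chunk):

# Pearson correlation test between valence and danceability
cor.test(df$valence, df$danceability)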

## 
##  Pearson's product-moment correlation
## 
## data:  df$valence and df$danceability
## t = 22.123, df = 2015, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4062534 0.4765128
## sample estimates:
##       cor 
## 0.4420609

The distribution of Energy and Danceability is clustered between 0.5 and 1.0 on the x axis. The correlation coefficient suggests that the relationship is very weak at 0.0386.

## 
##  Pearson's product-moment correlation
## 
## data:  df$energy and df$danceability
## t = 1.7321, df = 2015, p-value = 0.08342
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.00509766  0.08206440
## sample estimates:
##        cor 
## 0.03855671

Again, the data is clustered heavily around -5 dB on the x axis and between 0.5 and 0.75 on the y axis, and again there is a very weak relationship of 0.1044.

## 
##  Pearson's product-moment correlation
## 
## data:  df$loudness and df$danceability
## t = 4.7104, df = 2015, p-value = 2.641e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.06099363 0.14733628
## sample estimates:
##       cor 
## 0.1043616

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

The two variables with the second strongest relationship were Valence and Danceability. Valence is an estimate of sentiment - positive/happy vs negative/sad. I would have expected the relationship between these two variables to be stronger.

I wanted to check the correlation between Loudness, Valence and Energy when compared to Danceability. Running the cor.test function on these variables gave the following results:

Variables | Correlation
Valence/Danceability | 0.4420609
Energy/Danceability | 0.03855671
Loudness/Danceability | 0.1043616

Aside from Valence and Danceability, which show a weak-moderate relationship, no strong correlations were present.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

The bimodal distribution of energy when subset by the Like/Dislike variable of Target was interesting. Overall, when looking at the distribution of Energy it is left-tailed, with the mass of the distribution concentrated on the right hand side. But when split by Target, we can see two definite peaks in the data.

What was the strongest relationship you found?

The strongest relationship was between Energy and Loudness. I would have thought that Danceability would show a similarly strong relationship, because the three variables could be assumed to be related - if a track had a high Energy rating, it would seem plausible that the Danceability rating would also increase. However, out of the three variables, this pairing had the lowest correlation (0.0386).

Multivariate Plots Section

I’d like to re-introduce Key into the variables to see if there is a relationship between Energy, Valence and Danceability, and whether this is reflected in the listener’s like/dislike rating.
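
A sketch of how the two subsets summarised below were likely created; the object names df_like and df_dislike match the summary calls, and the column selection follows the output:

# split the data by the listener's like/dislike rating
df_like <- df %>%
  filter(target == 'like') %>%
  select(danceability, energy, valence, loudness, key, mode, target)

df_dislike <- df %>%
  filter(target == 'dislike') %>%
  select(danceability, energy, valence, loudness, key, mode, target)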

Likes Summary

summary(df_like)
##   danceability        energy          valence          loudness      
##  Min.   :0.1220   Min.   :0.0310   Min.   :0.0359   Min.   :-25.756  
##  1st Qu.:0.5535   1st Qu.:0.5720   1st Qu.:0.3220   1st Qu.: -8.829  
##  Median :0.6705   Median :0.7080   Median :0.5300   Median : -6.948  
##  Mean   :0.6465   Mean   :0.6898   Mean   :0.5232   Mean   : -7.353  
##  3rd Qu.:0.7672   3rd Qu.:0.8323   3rd Qu.:0.7170   3rd Qu.: -5.306  
##  Max.   :0.9620   Max.   :0.9890   Max.   :0.9920   Max.   : -0.307  
##                                                                      
##       key         mode         target    
##  c#     :117   minor:431   dislike:   0  
##  d      :109   major:589   like   :1020  
##  c      :107                             
##  g      :107                             
##  a      :104                             
##  b      : 98                             
##  (Other):378

Dislikes Summary

summary(df_dislike)
##   danceability        energy          valence          loudness      
##  Min.   :0.1520   Min.   :0.0148   Min.   :0.0348   Min.   :-33.097  
##  1st Qu.:0.4870   1st Qu.:0.5490   1st Qu.:0.2620   1st Qu.: -7.577  
##  Median :0.5980   Median :0.7230   Median :0.4660   Median : -5.535  
##  Mean   :0.5896   Mean   :0.6731   Mean   :0.4698   Mean   : -6.812  
##  3rd Qu.:0.6970   3rd Qu.:0.8610   3rd Qu.:0.6550   3rd Qu.: -4.251  
##  Max.   :0.9840   Max.   :0.9980   Max.   :0.9740   Max.   : -0.787  
##                                                                      
##       key         mode         target   
##  c#     :140   minor:351   dislike:997  
##  c      :109   major:646   like   :  0  
##  g      :105                            
##  b      : 89                            
##  a      : 87                            
##  f      : 83                            
##  (Other):384

The boxplots for Valence, Energy and Danceability are interesting when split by the Target and Key variables. Overall, these plots indicate that in the majority of keys, the listener prefers tracks with higher average ratings across each measure. The means indicated by summary statistics further support this. There are a few outliers present for different keys.

I’d like to take a look at the different variables by splitting the data into two parts - LIKE and DISLIKE - then looking at the correlations between Valence, Energy, Loudness and Danceability to find the strongest.
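
The outputs below reference vectors such as x_like_val and y_like_dance; a sketch of how they were likely constructed from the subsets above, followed by the four tests in the order shown:

# extract the vectors named in the cor.test output
x_like_val <- df_like$valence
y_like_dance <- df_like$danceability
x_dis_val <- df_dislike$valence
y_dis_dance <- df_dislike$danceability
x_like_loud <- df_like$loudness
y_like_en <- df_like$energy
x_dis_loud <- df_dislike$loudness
y_dis_en <- df_dislike$energy

cor.test(x_like_val, y_like_dance)
cor.test(x_dis_val, y_dis_dance)
cor.test(x_like_loud, y_like_en)
cor.test(x_dis_loud, y_dis_en)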

## 
##  Pearson's product-moment correlation
## 
## data:  x_like_val and y_like_dance
## t = 9.8526, df = 1018, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2379799 0.3500935
## sample estimates:
##       cor 
## 0.2950519

## 
##  Pearson's product-moment correlation
## 
## data:  x_dis_val and y_dis_dance
## t = 22.653, df = 995, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5408087 0.6228392
## sample estimates:
##       cor 
## 0.5833094

## 
##  Pearson's product-moment correlation
## 
## data:  x_like_loud and y_like_en
## t = 26.976, df = 1018, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6083671 0.6800696
## sample estimates:
##       cor 
## 0.6456392

## 
##  Pearson's product-moment correlation
## 
## data:  x_dis_loud and y_dis_en
## t = 46.603, df = 995, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8075684 0.8466863
## sample estimates:
##      cor 
## 0.828133

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

The Target variable seemed to strengthen the correlation between some of the key features I decided to look at (Loudness, Energy, Valence and Danceability).

Subsetting the data by Key unfortunately created very small sample sizes so I don’t think that I could find anything too meaningful by investigating down at this level.

Were there any interesting or surprising interactions between features?

The strongest relationship in the Likes data set was between Loudness and Energy. However, I found it very interesting that the Dislikes data set produced stronger relationships between all four of the above variables. Again, Energy and Loudness were the strongest, with a result of 0.8281.

This does provide an insight into the preferences of the listener. They seem to prefer songs with a more positive sentiment, demonstrated by the means being higher across the majority of keys. The same can be said about Danceability.


Final Plots and Summary

Plot One

Description One

The correlation between the Loudness and Energy ratings was the strongest. Originally, this plot was looked at in a few different ways - first overall, then by subsetting the data by the most common key signature (C♯), then by subsetting the data by the Target (like/dislike) variable. Subsetting by Key created a small sample size, but the relationship was still moderately positive. The above plot is a convenient way of summarising the relationship between Loudness and Energy based on the listener’s preference. It immediately shows that the relationship between the variables is actually stronger in the songs the listener disliked.
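
A sketch of how a plot along these lines could be built, assuming ggplot2; the scatter, linear smoother and facet-by-target layout are assumptions about how Plot One was constructed:

# loudness vs energy, split by the listener's like/dislike rating
ggplot(df, aes(x = loudness, y = energy, colour = target)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = 'lm') +
  facet_wrap(~ target) +
  labs(x = 'Loudness (dB)', y = 'Energy')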

Plot Two

Description Two

This plot takes the second matrix plot and adds in the Target variable to check the relationships between variables based on whether the user liked or disliked the songs. This approach significantly speeds up the EDA process by providing a quick summary of multiple variables in one plot. Along with a visual representation, the correlation statistics are also available in the upper section of the plot.
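
A sketch of that plot, assuming GGally; mapping the Target factor to colour splits each panel and correlation statistic by like/dislike:

library(GGally)
library(ggplot2)

# matrix plot with the target variable mapped to colour
ggpairs(df[, c('danceability', 'energy', 'loudness', 'valence', 'target')],
        mapping = aes(colour = target))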

Plot Three

Description Three

This plot appeared in the Bivariate section of the report. The simple set of boxplots provided summary statistics of the Tempo variable for each Key. I wanted to get more information from this by splitting the plot by the musical Mode (major/minor). In doing so, we are able to see that the median tempo remains consistent across the majority of keys AND modes (f major and a# minor being the exceptions). The Tufte-style boxplot was chosen for design purposes, as the original plot was becoming crowded with the addition of the mode variable.
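
A sketch of such a Tufte-style boxplot, using geom_tufteboxplot from the ggthemes package; faceting by mode is one assumed way to add the second split:

library(ggplot2)
library(ggthemes)

# minimal Tufte-style boxplots of tempo by key, one panel per mode
ggplot(df, aes(x = key, y = tempo)) +
  geom_tufteboxplot() +
  facet_wrap(~ mode) +
  theme_tufte()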


Reflection

This data set provided interesting insights into Spotify’s classification ratings. Until I found this data set, I was unaware of the kinds of measures they attribute to music to help them create user playlists such as ‘Discover Weekly’, or even the Artist Radio stations that are built around the music that users like to listen to. By looking at the features of one user’s preferences in a number of different ways, it was possible to identify music in particular keys and modes that the listener preferred, and to see that they prefer music with higher positive sentiment (Valence) ratings.

The data set wasn’t overly large, and this caused some issues when subsetting by key: sample sizes were often smaller than 100 observations (when split into like/dislike datasets). I would like to further explore the musical keys to see if the correlations between some of the features increase or decrease based on the song key.

I also found that the correlations between some variables weren’t as strong as I would have expected - Energy and Danceability, for example. In the Bivariate section of the analysis, I found that the distribution of Energy when subset by the key of C♯ appeared bimodal; however, this was a deceptive visual effect caused by the grid.arrange function, which compressed the heights of the charts. This accentuated the peaks in the plot, exaggerating them to appear bimodal when in fact this was not the case.

This dataset is inherently biased because it has been collected by one user from playlists that he/she has created. You would assume that someone wouldn’t put a song they didn’t like into a playlist in the first place. This could account for the similarity in the counts of liked and disliked songs. So how robust is the Target variable? Should it be a 5-point scale instead? I also found it hard to find any really strong linear relationships. One explanation is that the variables realistically only have a short range. Take Tempo, for example: if a song is played too fast it becomes messy and unlistenable, and you couldn’t dance to it. Loudness is the same: too loud and you can damage your hearing, too soft and it is inaudible. A larger dataset that included many listeners’ preferences could influence the relationships in a different way.

Thinking more broadly, this type of analysis can provide interesting insights not only for Spotify (who no doubt already do this!) but for record labels and advertising and/or marketing agencies. A listener could be targeted for specific campaigns based on their musical preferences. Advertising for new release music, tours, festivals and even products could be targeted to more specific demographics based on this kind of analysis. Coupling this dataset with additional information such as age, gender, location and even play counts would make it more powerful still.

In the future, I would be interested in building a machine learning model that attempts to predict a listener’s taste in music based on the features of the music they currently enjoy. The listener’s preference for particular song keys or modes, coupled with features like Energy and Valence, could be used as variables in the model. This would again be useful for targeted advertising, but could also provide interesting insights into the musical preferences of a specific demographic. It could also be related back to Christian Schubart’s descriptions of keys from Ideen zu einer Aesthetik der Tonkunst, to see if modern music still holds true to these based on Spotify’s classifications - i.e. does the key of E♭ minor have lower Valence and Danceability ratings?